Locally Hierarchical Auto-Regressive Modeling for Image Generation (Supplementary Document)

Neural Information Processing Systems

At the first epoch, the learning rate is warmed up gradually from lr_init = 1e-5 to lr_peak. Figures A and B demonstrate the performance of the baseline and rejection sampling under varying hyperparameters such as top-k, softmax temperature, and acceptance ratio. For baseline sampling on ImageNet, the hyperparameter setting with k = 2048 and temperature t = 0.95 achieves the best FID in the small and medium models and the second-best in the large model.

Figure C: Examples of reconstructed images using HQ-VAE with the learnable down- and upsampling layers.

B.3 Prediction Head Transformer (PHT)

We propose locally hierarchical decoding in PHT, in contrast to the standard sequential approach, by assuming conditional independence among bottom codes given a top code. We use pixel-shuffle and -unshuffle for the resizing operations, as illustrated in (a), while recursively quantizing hierarchical feature maps to acquire three-level codes: top, middle, and bottom.
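The top-k and softmax-temperature hyperparameters swept in Figures A and B can be sketched as follows. This is a minimal NumPy illustration of top-k sampling with temperature scaling, not the paper's implementation; the function and argument names are our own:

```python
import numpy as np

def topk_temperature_sample(logits, k=2048, t=0.95, rng=None):
    """Sample a code index from `logits` after temperature scaling
    and top-k filtering (illustrative sketch; names are our own)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / t  # temperature scaling
    # Keep only the k largest logits; mask the rest to -inf.
    if k < logits.size:
        kth = np.partition(logits, -k)[-k]
        logits = np.where(logits >= kth, logits, -np.inf)
    # Softmax over the surviving logits (masked entries get probability 0).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(logits.size, p=probs)
```

Lowering t below 1 sharpens the distribution while smaller k prunes the tail of the code vocabulary; the sweep in Figures A and B explores this trade-off against FID.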



Locally Hierarchical Auto-Regressive Modeling for Image Generation

Neural Information Processing Systems

We propose a locally hierarchical auto-regressive model with multiple resolutions of discrete codes. In the first stage of our algorithm, we represent an image with a pyramid of codes using a Hierarchically Quantized Variational AutoEncoder (HQ-VAE), which disentangles the information contained in the multi-level codes. In the case of two-level codes, for example, we create two separate pathways: top codes carry the high-level coarse structure of input images, while a residual connection for bottom codes compensates for the missing fine details. An appropriate selection of resizing operations for the code embedding maps enables top codes to capture maximal information within images, and the first-stage algorithm achieves better performance on both vector quantization and image generation. The second stage adopts a Hierarchically Quantized Transformer (HQ-Transformer) to process a sequence of local pyramids, each of which consists of a single top code and its corresponding bottom codes.
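The local pyramid described above, in which each top code is grouped with its corresponding bottom codes, can be illustrated with a space-to-depth (pixel-unshuffle) rearrangement. Below is a minimal NumPy sketch under our own naming and layout, not the authors' code:

```python
import numpy as np

def local_pyramids(top_codes, bottom_codes, r=2):
    """Pair each top code with its r x r block of bottom codes.

    top_codes:    (H, W) integer code map
    bottom_codes: (H*r, W*r) integer code map
    Returns a (H*W, 1 + r*r) array: each row is one local pyramid,
    [top, b_00, b_01, ..., b_rr] in raster order (our own layout).
    """
    H, W = top_codes.shape
    # Pixel-unshuffle: gather each r x r spatial block of bottom codes
    # into the last dimension, aligned with its parent top code.
    blocks = (bottom_codes.reshape(H, r, W, r)
                          .transpose(0, 2, 1, 3)
                          .reshape(H, W, r * r))
    tops = top_codes.reshape(H, W, 1)
    return np.concatenate([tops, blocks], axis=-1).reshape(H * W, 1 + r * r)
```

Processing the code maps as a sequence of such local pyramids, rather than one long raster scan over every level, is what lets the second-stage transformer decode the bottom codes of each pyramid conditionally independently given its top code.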



